104 research outputs found

A conditional compression distance that unveils insights of the genomic evolution

Author: Pinho Armando J.
Pratas Diogo
Publication date: 16/01/2014

We describe a compression-based distance for genomic sequences. Instead of using the usual conjoint information content, as in the classical Normalized Compression Distance (NCD), it uses the conditional information content. To compute this Normalized Conditional Compression Distance (NCCD), we need a normal conditional compressor, which we built using a mixture of static and dynamic finite-context models. Using this approach, we measured chromosomal distances between Hominidae primates and also between Muroidea (rat and mouse), observing several insights of evolution that so far have not been reported in the literature. Comment: Full version of DCC 2014 paper "A conditional compression distance that unveils insights of the genomic evolution"
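As a rough illustration of the difference between the conjoint and the conditional formulation, the sketch below computes the classical NCD with a general-purpose compressor and a conditional-style variant. Everything here is an assumption made for illustration: `zlib` stands in for the authors' finite-context compressor, the conditional size C(x|y) is approximated as C(yx) - C(y), and the normalization in `nccd_approx` is modeled on the normalized information distance rather than taken from the abstract.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs for `data` at the highest compression level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Classical NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def conditional_size(x: bytes, y: bytes) -> int:
    """Crude stand-in for C(x|y): extra bytes needed for x once y has been seen.

    Assumption: a real conditional compressor would model x given y directly;
    here the cost is approximated as C(yx) - C(y).
    """
    return max(compressed_size(y + x) - compressed_size(y), 0)


def nccd_approx(x: bytes, y: bytes) -> float:
    """Assumed conditional analogue: max(C(x|y), C(y|x)) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    return max(conditional_size(x, y), conditional_size(y, x)) / max(cx, cy)
```

With both functions, a sequence compared against itself should score close to 0 and two unrelated sequences close to 1. Note that zlib only looks back 32 KB, so this sketch is not meaningful for chromosome-sized inputs.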

arXiv.org e-Print Archive

Crossref

Information profiles for DNA pattern discovery

Author: Ferreira Paulo J. S. G.
Pinho Armando J.
Pratas Diogo
Publication date: 19/01/2014

Finite-context modeling is a powerful tool for compressing and hence for representing DNA sequences. We describe an algorithm to detect genomic regularities, within a blind discovery strategy. The algorithm uses information profiles built using suitable combinations of finite-context models. We used the genome of the fission yeast Schizosaccharomyces pombe strain 972 h- for illustration, unveiling locations of low information content, which are usually associated with DNA regions of potential biological interest. Comment: Full version of DCC 2014 paper "Information profiles for DNA pattern discovery"
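To make the idea of an information profile concrete, here is a minimal sketch that uses a single adaptive order-k finite-context model. It is not the authors' algorithm, which combines several models; the function name `information_profile`, the default order `k`, and the smoothing parameter `alpha` are assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from typing import List


def information_profile(seq: str, k: int = 3, alpha: float = 1.0,
                        alphabet: str = "ACGT") -> List[float]:
    """Per-symbol information content (in bits) under one order-k model.

    The model is adaptive: counts are updated after each symbol is scored,
    so the estimate at position i depends only on symbols before i.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    profile = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]  # up to k preceding symbols
        # Additive smoothing keeps unseen symbols at non-zero probability.
        p = (counts[ctx][sym] + alpha) / (totals[ctx] + alpha * len(alphabet))
        profile.append(-math.log2(p))
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return profile
```

Positions where the profile drops well below 2 bits per base are those the model predicts well, for example inside repeats; these are the low-information regions the abstract refers to.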

arXiv.org e-Print Archive

Crossref

Histogram packing, total variation, and lossless image compression

Author: Ferreira P.J.S.G.
Pinho Armando J.

Publication in the conference proceedings of EUSIPCO, Toulouse, France, 200

ZENODO

Compression of Microarray Images

Author: Antonio J. R. Neves
Armando J. Pinho
Publication venue: 'IntechOpen'
Publication date: 01/03/2010

IntechOpen

Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements

Author: Hosseini Morteza
Morgenstern Burkhard
Pinho Armando J.
Pratas Diogo
Publication date: 01/05/2020

Background: The development of high-throughput sequencing technologies and, as its result, the production of huge volumes of genomic data, has accelerated biological and medical research and discovery. Study on genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer. Results: We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts information contents of the 2 sequences, exploiting a data compression technique to find rearrangements. We also present Smash++ visualizer, a tool that allows the visualization of the detected rearrangements along with their self- and relative complexity, by generating an SVG (Scalable Vector Graphics) image. Conclusions: Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves, and Mammalia, the proposed tool was able to accurately find genomic rearrangements. The detected regions were in accordance with previous studies, which took alignment-based approaches or performed FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage among all experiments was approximately 1 GB, which makes Smash++ feasible to run on present-day standard computers. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Competitive Segmentation Performance on Near-lossless and Lossy Compressed Remote Sensing Images

Author: García-Sobrino Joaquín
Pinho Armando J.
Serra-Sagristà Joan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Image segmentation lies at the heart of multiple image processing chains, and achieving accurate segmentation is of utmost importance as it impacts later processing. Image segmentation has recently gained interest in the field of remote sensing, mostly due to the widespread availability of remote sensing data. This increased availability poses the problem of transmitting and storing large volumes of data. Compression is a common strategy to alleviate this problem. However, lossy or near-lossless compression prevents a perfect reconstruction of the recovered data. This letter investigates the image segmentation performance in data reconstructed after a near-lossless or a lossy compression. Two image segmentation algorithms and two compression standards are evaluated on data from several instruments. Experimental results reveal that segmentation performance over previously near-lossless and lossy compressed images is not markedly reduced at low and moderate compression ratios. In some scenarios, accurate segmentation performance can be achieved even for high compression ratios.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Diposit Digital de Documents de la UAB

Lossy-to-Lossless Compression of Biomedical Images Based on Image Decomposition

Author: Matos Luís M. O.
Neves António J. R.
Pinho Armando J.
Publication venue: 'IntechOpen'
Publication date: 28/10/2015

The use of medical imaging has increased in recent years, especially with magnetic resonance imaging (MRI) and computed tomography (CT). Microarray imaging and images that can be extracted from RNA interference (RNAi) experiments also play an important role for large-scale gene sequence and gene expression analysis, allowing the study of gene function, regulation, and interaction across a large number of genes and even across an entire genome. These types of medical image modalities produce huge amounts of data that, for several reasons, need to be stored or transmitted at the highest possible fidelity between various hospitals, medical organizations, or research units.

IntechOpen

Crossref

A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

Author: Hosseini Morteza
Pinho Armando J.
Pratas Diogo
Silva Jorge M.
Publication date: 01/11/2019

The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes of models use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches, on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available, under the GPLv3 license. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Dissimilar Symmetric Word Pairs in the Human Genome

Author: Afreixo Vera
Bastos Carlos A. C.
Brito Paula
Pinho Armando
Raymaekers Jakob
Rousseeuw Peter J.
Silva Raquel M.
Tavares Ana Helena
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

In this work we explore the dissimilarity between symmetric word pairs, by comparing the inter-word distance distribution of a word to that of its reversed complement. We propose a new measure of dissimilarity between such distributions. Since symmetric pairs with different patterns could point to evolutionary features, we search for the pairs with the most dissimilar behaviour. We focus our study on the complete human genome and its repeat-masked version. Comment: Submitted 13-Feb-2017; accepted, after a minor revision, 17-Mar-2017; 11th International Conference on Practical Applications of Computational Biology & Bioinformatics, PACBB 2017, Porto, Portugal, 21-23 June, 201
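The comparison described in this abstract can be sketched as follows: collect the distances between consecutive occurrences of a word, do the same for its reversed complement, and compare the two empirical distributions. The abstract does not define the authors' dissimilarity measure, so `total_variation` below is a generic stand-in, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reversed_complement(word: str) -> str:
    """Complement each base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]


def inter_word_distances(seq: str, word: str) -> List[int]:
    """Distances between consecutive (possibly overlapping) occurrences of `word`."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(distances: List[int]) -> Dict[int, float]:
    """Empirical relative frequency of each observed distance."""
    counts = Counter(distances)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Stand-in dissimilarity (not the paper's measure): total variation distance."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

For a word `w`, the two distributions would be `distance_distribution(inter_word_distances(seq, w))` and the same expression with `reversed_complement(w)`; symmetric pairs with a large dissimilarity are the ones the abstract singles out.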

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref